
    Efficient Diversification of Web Search Results

    In this paper we analyze the efficiency of various search results diversification methods. While the efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have rarely been addressed. We thus propose a unified framework for studying the performance and feasibility of result diversification solutions. First, we define a new methodology for detecting when, and how, query results need to be diversified. To this end, we rely on the concept of "query refinement" to estimate the probability that a query is ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare three different diversification methods on a standard test set: IASelect, xQuAD, and OptSelect. While the first two are recent state-of-the-art proposals, the latter is an original algorithm introduced in this paper. We evaluate both the efficiency and the effectiveness of our approach against its competitors using the standard TREC Web diversification track testbed. Results show that OptSelect runs two orders of magnitude faster than the other two state-of-the-art approaches while obtaining comparable diversification effectiveness. Comment: VLDB201
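
    As a rough illustration of how query-log refinements can signal ambiguity, the sketch below scores a query by the normalized entropy of its observed refinements. The log format and the entropy-based estimator are our assumptions for illustration, not necessarily the paper's exact detection method.

```python
import math
from collections import Counter

def ambiguity_score(query, query_log):
    """Score a query's ambiguity by the normalized entropy of its
    refinements observed in a query log (hypothetical estimator)."""
    refinements = Counter(r for q, r in query_log if q == query)
    total = sum(refinements.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in refinements.values()]
    # A query whose follow-ups spread evenly over many distinct
    # refinements is more likely to be ambiguous than one with a
    # single dominant refinement.
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(probs)) if len(probs) > 1 else 1.0
    return entropy / max_entropy

# "jaguar" is refined toward both car- and animal-related intents.
log = [("jaguar", "jaguar car"), ("jaguar", "jaguar animal"),
       ("jaguar", "jaguar car"), ("jaguar", "jaguar os")]
print(ambiguity_score("jaguar", log))  # close to 1.0 -> likely ambiguous
```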

    The Impact of Novel Computing Architectures on Large-Scale Distributed Web Information Retrieval Systems

    Web search engines are the most popular means of interaction with the Web. Realizing a search engine that scales to such demands presents many challenges. Fast crawling technology is needed to gather Web documents. Indexing has to process hundreds of gigabytes of data efficiently. Queries have to be handled quickly, at a rate of thousands per second. As a solution, within a datacenter, services are built up from clusters of common homogeneous PCs. However, Information Retrieval (IR) has to face issues raised by the growing amount of Web data, as well as the growing number of users. In response to these issues, cost-effective specialized hardware is available nowadays. In our opinion, this hardware is ideal for migrating distributed IR systems to computer clusters comprising heterogeneous processors, in order to meet their need for computing power. Toward this end, we introduce K-model, a computational model for properly evaluating algorithms designed for such hardware. We study the impact of K-model rules on algorithm design. To evaluate the benefits of using K-model in evaluating algorithms, we compare the complexity of a solution built using our properly designed techniques against that of existing ones. Although competitors are in theory more efficient, the value of K-model is demonstrated empirically: our solutions have been shown to be faster than state-of-the-art implementations.

    QuickRank: a C++ Suite of Learning to Rank Algorithms

    Ranking is a central task of many Information Retrieval (IR) problems, and it is particularly challenging in the case of large-scale Web collections, where it involves effectiveness requirements and efficiency constraints that are not common to other ranking-based applications. This paper describes QuickRank, a C++ suite of efficient and effective Learning to Rank (LtR) algorithms that allows high-quality ranking functions to be devised from possibly huge training datasets. QuickRank is a project with a double goal: i) answering the industrial need of Tiscali S.p.A. for a flexible and scalable LtR solution for learning ranking models from huge training datasets; ii) providing the IR research community with a flexible, extensible and efficient LtR framework to design LtR solutions and fairly compare the performance of different algorithms and ranking models. This paper presents our choices in designing QuickRank and reports some preliminary use experiences.

    Online Convergent Scheduling: An Approach for Scheduling Batch Jobs on a Grid

    Design and evaluation of a system for managing the scheduling phase of a stream of jobs with multiple constraints, not known a priori, on a grid dedicated to utility computing. Scheduling is carried out with the convergent scheduling technique, assigning an independent priority to each possible job-machine allocation and then using an efficient, optimized procedure for the final matching.
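
    As an illustration of the final matching step, here is a minimal sketch in Python: given a priority for each feasible job-machine pair, a single greedy pass assigns each job to the best still-free machine. The data layout and the one-job-per-machine assumption are ours; the thesis's optimized matching procedure may differ.

```python
def greedy_matching(priorities):
    """Assign jobs to machines by repeatedly taking the unassigned
    (job, machine) pair with the highest priority.

    priorities: dict mapping (job, machine) -> priority value.
    Returns a dict job -> machine (one job per machine assumed).
    """
    assignment, used_machines = {}, set()
    # One sorted pass over all candidate pairs: O(n log n) overall.
    for (job, machine), _ in sorted(priorities.items(),
                                    key=lambda kv: kv[1], reverse=True):
        if job not in assignment and machine not in used_machines:
            assignment[job] = machine
            used_machines.add(machine)
    return assignment

prio = {("j1", "m1"): 0.9, ("j1", "m2"): 0.4,
        ("j2", "m1"): 0.7, ("j2", "m2"): 0.6}
print(greedy_matching(prio))  # {'j1': 'm1', 'j2': 'm2'}
```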

    Adaptive Collision Culling for Massive Simulations by a Parallel and Context-Aware Sweep and Prune Algorithm


    A Job Scheduling Framework for Large Computing Farms

    In this paper, we propose a new method, called Convergent Scheduling, for scheduling a continuous stream of batch jobs on the machines of large-scale computing farms. This method exploits a set of heuristics that guide the scheduler in making decisions. Each heuristic manages a specific problem constraint and contributes to computing a value that measures the degree of matching between a job and a machine. Scheduling choices are made to meet the QoS requested by the submitted jobs and to optimize the usage of hardware and software resources. We compared it with some of the most common job scheduling algorithms, i.e. Backfilling and Earliest Deadline First. Convergent Scheduling is able to compute good assignments, while being a simple and modular algorithm.
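
    The sketch below conveys the flavor of this design in Python: each heuristic handles one constraint and returns a score in [0, 1], and a weighted sum yields the job-machine matching degree. The two heuristics, their fields, and the weights are hypothetical stand-ins, not the paper's actual heuristics.

```python
def deadline_heuristic(job, machine):
    # Favor machines fast enough to finish the job with slack before
    # its deadline (hypothetical constraint, normalized to [0, 1]).
    finish = job["arrival"] + job["work"] / machine["speed"]
    slack = job["deadline"] - finish
    return max(0.0, min(1.0, slack / job["work"]))

def license_heuristic(job, machine):
    # A job requiring software licenses scores 0 on machines that
    # lack any of them (hypothetical constraint).
    return 1.0 if job["licenses"] <= machine["licenses"] else 0.0

HEURISTICS = [deadline_heuristic, license_heuristic]

def matching_degree(job, machine, weights=(0.7, 0.3)):
    """Each heuristic manages one constraint; their weighted sum is
    the degree of matching between the job and the machine."""
    return sum(w * h(job, machine) for w, h in zip(weights, HEURISTICS))

job = {"arrival": 0.0, "work": 100.0, "deadline": 50.0, "licenses": {"matlab"}}
machine = {"speed": 4.0, "licenses": {"matlab", "gcc"}}
print(matching_degree(job, machine))  # 0.475
```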

    Effective Data Access Patterns on Massively Parallel Processors

    © 2014 John Wiley & Sons, Inc. The new generation of microprocessors incorporates a huge number of cores on the same chip; graphics processing units are an example of this kind of architecture. This chapter discusses the characteristics and the issues of the memory systems of such architectures. It analyzes them from a theoretical point of view using the K-model to estimate the complexity of a given algorithm defined on this computational model. The chapter describes how the K-model can be used to design efficient data access patterns for implementing efficient GPU algorithms. It introduces some preliminary details of many-core architectures, describes the K-model, and analyzes two applications, parallel prefix sum and bitonic sorting networks, by means of the K-model. Finally, the chapter concludes that the experiments conducted demonstrate that the K-model can be fruitfully exploited to design efficient algorithms for computational platforms with many cores.
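
    To make the prefix-sum example concrete, here is a work-efficient (Blelloch-style) exclusive scan written sequentially in Python, so that the strided access pattern a GPU kernel would generate is visible. This is our illustrative rendering, not the chapter's K-model analysis itself.

```python
def exclusive_scan(a):
    """Work-efficient exclusive prefix sum. At each tree level the
    algorithm touches elements with stride 2^(d+1) -- exactly the
    kind of memory access pattern a cost model like K-model prices."""
    n = len(a)  # assumed to be a power of two, for brevity
    a = list(a)
    # Up-sweep (reduce) phase: strides double at each level.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    # Down-sweep phase: strides halve, distributing partial sums.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            a[i - d], a[i] = a[i], a[i] + a[i - d]
        d //= 2
    return a

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# [0, 3, 4, 11, 11, 15, 16, 22]
```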

    Efficient Diversification of Search Results using Query Logs

    We study the problem of diversifying search results by exploiting knowledge mined from query logs. Our proposal exploits the presence of different “specializations” of queries in query logs to detect the submission of ambiguous/faceted queries, and manages them by diversifying the returned search results so as to cover the different possible interpretations of the query. We present an original formulation of the results diversification problem in terms of an objective function to be maximized that admits an optimal solution computable in linear time.
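
    A minimal sketch of the proportional-coverage idea in Python: given the specializations of a query and their popularity in the log, result slots are filled in proportion to each specialization's probability in a single linear pass. This conveys why such an objective can admit a linear-time solution; it is not the paper's exact formulation.

```python
def diversify(specializations, k):
    """Fill k result slots, covering each query specialization in
    proportion to its probability mined from the query log.

    specializations: list of (probability, ranked_results) pairs.
    One pass over the candidates keeps the selection linear.
    """
    selected = []
    for prob, results in specializations:
        quota = round(prob * k)  # slots owed to this interpretation
        selected.extend(results[:quota])
    return selected[:k]

specs = [(0.6, ["jaguar car review", "jaguar xf price"]),
         (0.4, ["jaguar habitat", "jaguar diet"])]
print(diversify(specs, 3))
# ['jaguar car review', 'jaguar xf price', 'jaguar habitat']
```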

    A Multilevel Scheduler for Batch Jobs on Grids

    This paper proposes a two-level scheduler for dynamically scheduling a continuous stream of sequential and multi-threaded batch jobs on grids made up of interconnected clusters of heterogeneous single-processor and/or symmetric multiprocessor machines. The scheduler aims to schedule arriving jobs while respecting their computational and deadline requirements and optimizing hardware and software resource usage. At the top of the hierarchy, a lightweight meta-scheduler (MS) classifies incoming jobs according to their requirements and schedules them among the underlying resources, balancing the workload. At the cluster level, a Flexible Backfilling algorithm carries out the job-machine associations by exploiting dynamic information about the environment. Scheduling decisions at both levels are based on job priorities computed using different sets of heuristics. The different proposals have been compared through simulations. Performance figures show the feasibility of our approach.
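
    A minimal sketch of the top-level dispatch in Python, under our own assumptions: each arriving job goes to the cluster with the least pending work, while requirement-based classification and the cluster-level Flexible Backfilling are omitted. Field names and the deadline-first ordering are hypothetical, not the paper's implementation.

```python
import heapq

def dispatch(jobs, clusters):
    """Send each job to the cluster with the least pending work,
    considering jobs in deadline order (load-balancing sketch)."""
    # Min-heap of (pending_work, cluster_name).
    load = [(0.0, name) for name in clusters]
    heapq.heapify(load)
    placement = {}
    for job in sorted(jobs, key=lambda j: j["deadline"]):
        work, name = heapq.heappop(load)
        placement[job["id"]] = name
        heapq.heappush(load, (work + job["runtime"], name))
    return placement

jobs = [{"id": "j1", "runtime": 10, "deadline": 30},
        {"id": "j2", "runtime": 5, "deadline": 20},
        {"id": "j3", "runtime": 8, "deadline": 25}]
print(dispatch(jobs, ["clusterA", "clusterB"]))
# {'j2': 'clusterA', 'j3': 'clusterB', 'j1': 'clusterA'}
```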